<div id="header">
<h1>iText<sup>&reg;</sup> XMLWorker</h1>
</div>
<h2>Introduction</h2>
<p>The iText<sup>&reg;</sup> XMLWorker is a package for transforming XML files to PDF. Parsing XML was already possible with iText, and many developers used that capability to convert simple HTML/XHTML snippets to PDF, but its CSS support was limited. The new XMLWorker improves CSS support. The initial focus is XHTML with CSS2 as produced by WYSIWYG editors such as TinyMCE or CKEditor. The XMLWorker is not restricted to XHTML: it can parse other kinds of XML and apply CSS to them, although this requires a specific implementation of the XML tags and/or CSS styles involved.</p>
<h2>Current Limitations</h2>
<p>The XMLWorker was initially created to parse snippets, and absolute positioning is not yet supported. As a result, it is currently not possible to, for instance, surround everything with a border. For the current CSS limitations, see CSS Support.</p>
<h2>Parsing XHTML/CSS snippets</h2>
<p>XHTML snippets can be parsed with the default implementation for converting HTML to PDF. See the Examples section for usage.</p>
<h3>Supported tags</h3>
<table>
<tbody>
<tr style="vertical-align: top;"><th>Tag</th><th>Comment</th><th>Supported Attributes</th></tr>
<tr>
<td>xml</td>
<td>used to determine the charset, if available (needs more thorough testing)</td>
<td>encoding</td>
</tr>
<tr>
<td>html</td>
<td>ignored</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>head</td>
<td>ignored</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>title</td>
<td>if a document is available, the title is set with <code>document.addTitle(title)</code>.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>meta</td>
<td>parses http-equiv="Content-Type" and the charset</td>
<td>http-equiv, content</td>
</tr>
<tr>
<td>script</td>
<td>ignored</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>style</td>
<td>parsed and added to css processing</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>link</td>
<td>css is parsed and added to global styles</td>
<td>type, href</td>
</tr>
<tr>
<td>body</td>
<td>direct content in body is added</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>a</td>
<td>supported</td>
<td>href, name</td>
</tr>
<tr>
<td>br</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>div</td>
<td>direct content in div is added</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>h1 to h6</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>p</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>span</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>img</td>
<td>supported</td>
<td>src, width, height</td>
</tr>
<tr>
<td>hr</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>ul, ol, li</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>dd, dl, dt</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>table</td>
<td>supported<br />Nested tables can work, but we advise against using them</td>
<td>width, border, cellspacing, cellpadding</td>
</tr>
<tr>
<td>tr</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>td, th</td>
<td>supported</td>
<td>width, rowspan, colspan</td>
</tr>
<tr>
<td>thead, tfoot, tbody</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>caption</td>
<td>caption element of a table is supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>sub</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>sup</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>small, big</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>b, strong</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>u, ins</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>i, cite, em, var, dfn, address</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>pre, tt, code, kbd, samp</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>s, strike, del</td>
<td>supported</td>
<td>&nbsp;</td>
</tr>
</tbody>
</table>
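<p>The table above can be turned into a rough pre-flight check that reports tags a snippet uses but the table does not list. This is only a sketch: the class and method names are hypothetical, it uses plain Java rather than any XMLWorker API, and the regular expression is a crude scan that will also match tag-like text inside comments or attribute values.</p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of XMLWorker.
public class SupportedTagCheck {
    // Tag names copied from the "Supported tags" table.
    // "dd" is included on the assumption that definition lists are handled as a whole.
    private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
        "html", "head", "title", "meta", "script", "style", "link", "body",
        "a", "br", "div", "h1", "h2", "h3", "h4", "h5", "h6", "p", "span",
        "img", "hr", "ul", "ol", "li", "dfn", "dl", "dt", "dd", "table", "tr",
        "td", "th", "thead", "tfoot", "tbody", "caption", "sub", "sup",
        "small", "big", "b", "strong", "u", "ins", "i", "cite", "em", "var",
        "address", "pre", "tt", "code", "kbd", "samp", "s", "strike", "del"));

    // Matches the name of an opening tag such as <p> or <img src="...">.
    private static final Pattern OPEN_TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)");

    /** Returns the sorted set of tag names in the snippet that the table does not list. */
    public static Set<String> unsupportedTags(String xhtml) {
        Set<String> result = new TreeSet<>();
        Matcher m = OPEN_TAG.matcher(xhtml);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (!SUPPORTED.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String snippet = "<div><p>Hello <blink>world</blink></p><video src=\"a.mp4\"></video></div>";
        System.out.println(unsupportedTags(snippet)); // prints [blink, video]
    }
}
```

<p>A non-empty result does not necessarily mean parsing will fail; it only flags tags whose handling the table does not describe.</p>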
<h3>Known issues</h3>
<p>The implementation is not yet complete. A few areas still need to be fixed or improved and are being worked on.<br />Not all CSS may behave as expected: there are many properties, and not every possible combination has been implemented and tested.<br />JavaScript is ignored entirely at the moment.<br />The character encoding of the provided snippet is taken into account, but this has not been tested yet.</p>
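<p>Because encoding handling is untested, one cautious option is to decode the bytes yourself and hand the parser a <code>Reader</code>. The sketch below uses only standard Java I/O; the class and method names are hypothetical, and whether XMLWorker still inspects encoding declarations inside the snippet afterwards is not covered here.</p>

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of XMLWorker.
public class ExplicitCharsetReader {
    /** Wraps a byte stream in a Reader that decodes with the given charset. */
    public static Reader open(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    /** Reads everything from the reader; used here only to show the decoded text. */
    public static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The snippet is known to be UTF-8, so it is decoded as UTF-8
        // instead of relying on the platform default charset.
        byte[] bytes = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        Reader reader = open(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(readAll(reader));
    }
}
```

<p>A <code>Reader</code> created this way can be passed wherever the examples on this page pass a <code>Reader</code>.</p>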
<h2>CSS Support</h2>
<p>n = not supported, f = fully supported, s = somewhat (partially) supported</p>
<table>
<tbody>
<tr><th>Property<br />The CSS property (CSS2/3)</th><th>Text<br />CSS properties applicable to text</th><th>Tables<br />CSS properties applicable to tables (table, td, tr)</th><th>Lists<br />CSS properties applicable to lists (ul, ol, li)</th><th>Images<br />CSS properties applicable to images (img)</th></tr>
<tr>
<td>background</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>background-attachment</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>background-color</td>
<td>f</td>
<td>f</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>background-image</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>background-position</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>background-repeat</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>border</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-bottom</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-bottom-color</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-bottom-style</td>
<td>n</td>
<td>s</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-bottom-width</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-color</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-collapse</td>
<td>&nbsp;</td>
<td>n - always collapsed</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>border-left</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-left-color</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-left-style</td>
<td>n</td>
<td>s</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-left-width</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-right</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-right-color</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-right-style</td>
<td>n</td>
<td>s</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-right-width</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-spacing</td>
<td>&nbsp;</td>
<td>n</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>border-style</td>
<td>n</td>
<td>s</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-top</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-top-color</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-top-style</td>
<td>n</td>
<td>s</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-top-width</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>border-width</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>bottom</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>caption-side</td>
<td>&nbsp;</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>clear</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>clip</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>color</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>content</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>counter-increment</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>counter-reset</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>cursor</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>direction</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>display</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>empty-cells</td>
<td>&nbsp;</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>float</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>font</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>font-family</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>font-size</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>font-style</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>font-variant</td>
<td>n</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>font-weight</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>height</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>left</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>letter-spacing</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>line-height</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>list-style</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>f</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>list-style-image</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>f</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>list-style-position</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>f</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>list-style-type</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>f</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>margin</td>
<td>f</td>
<td>f</td>
<td>s (not on li)</td>
<td>n</td>
</tr>
<tr>
<td>margin-bottom</td>
<td>f</td>
<td>f</td>
<td>f</td>
<td>n</td>
</tr>
<tr>
<td>margin-left</td>
<td>f</td>
<td>f</td>
<td>f (not on li)</td>
<td>n</td>
</tr>
<tr>
<td>margin-right</td>
<td>f</td>
<td>f</td>
<td>s (not on li)</td>
<td>n</td>
</tr>
<tr>
<td>margin-top</td>
<td>f</td>
<td>f</td>
<td>f</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>max-height</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>max-width</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>min-height</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>min-width</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>orphans</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>outline</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>outline-color</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>outline-style</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>outline-width</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>overflow</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>padding</td>
<td>f</td>
<td>f</td>
<td>s</td>
<td>n</td>
</tr>
<tr>
<td>padding-bottom</td>
<td>f</td>
<td>f</td>
<td>f</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>padding-left</td>
<td>f</td>
<td>f</td>
<td>f (not on li)</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>padding-right</td>
<td>f</td>
<td>f</td>
<td>f (not on li)</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>padding-top</td>
<td>f</td>
<td>f</td>
<td>f</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>page-break-after</td>
<td>s - only value always</td>
<td>s - only value always</td>
<td>s - only value always</td>
<td>s - only value always</td>
</tr>
<tr>
<td>page-break-before</td>
<td>s - only value always</td>
<td>s - only value always</td>
<td>s - only value always</td>
<td>s - only value always</td>
</tr>
<tr>
<td>page-break-inside</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>position</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>quotes</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>right</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>table-layout</td>
<td>&nbsp;</td>
<td>s</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>text-align</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>text-decoration</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>text-indent</td>
<td>f</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>text-shadow</td>
<td>n</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>text-transform</td>
<td>n</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>top</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>unicode-bidi</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>vertical-align</td>
<td>f</td>
<td>f</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>visibility</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>n</td>
</tr>
<tr>
<td>white-space</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>widows</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>width</td>
<td>n</td>
<td>f</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>word-spacing</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>z-index</td>
<td>n</td>
<td>n</td>
<td>n</td>
<td>&nbsp;</td>
</tr>
</tbody>
</table>
<h3>Notes</h3>
<p>When <code>page-break-before</code> or <code>page-break-after</code> is used inside tags that become a single element in the PDF (such as lists or tables), the result of adding a new page is unpredictable.</p>
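<p>To spot such cases before conversion, a snippet can be scanned for page-break styles nested inside a table or list. The following is a minimal sketch using the JDK's DOM parser; the class and method names are hypothetical, it only looks at inline <code>style</code> attributes (not at rules in stylesheets), and it assumes the snippet is well-formed XML without undeclared entities such as <code>&amp;nbsp;</code>.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLWorker.
public class PageBreakCheck {
    /** Returns the names of elements that request a page break while nested in a table or list. */
    public static List<String> riskyPageBreaks(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xhtml)));
            List<String> found = new ArrayList<>();
            NodeList all = dom.getElementsByTagName("*"); // every element, in document order
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                String style = e.getAttribute("style"); // empty string when absent
                boolean breaks = style.contains("page-break-before")
                        || style.contains("page-break-after");
                if (breaks && insideSingleElementContainer(e)) {
                    found.add(e.getTagName());
                }
            }
            return found;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Snippet could not be parsed as XML", ex);
        }
    }

    // True when any ancestor is a table or a list.
    private static boolean insideSingleElementContainer(Element e) {
        for (Node n = e.getParentNode(); n instanceof Element; n = n.getParentNode()) {
            String name = ((Element) n).getTagName();
            if (name.equals("table") || name.equals("ul") || name.equals("ol")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String snippet = "<div><p style=\"page-break-before: always\">a</p>"
                + "<ul><li style=\"page-break-after: always\">b</li></ul></div>";
        System.out.println(riskyPageBreaks(snippet)); // prints [li]
    }
}
```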
<h2>Examples</h2>
<h3>Default XHTML/CSS processing (Java):</h3>
<p>The quick way</p>
<pre>	// create a document to write to
	final Document doc = new Document();
	PdfWriter.getInstance(doc, new FileOutputStream("out.pdf"));
	// make sure it's open
	doc.open();
	// read the html from somewhere
	BufferedInputStream bis = new BufferedInputStream(new FileInputStream("snippet.html"));
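	// parse and listen for elements to add to the document
	// (helper is assumed to be an XMLWorkerHelper, the class used in the extended setup)
	XMLWorkerHelper helper = new XMLWorkerHelper();
	helper.parseXHtml(new ElementHandler() {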
		public void addAll(final List&lt;Element&gt; currentContent) throws DocumentException {
			for (Element e : currentContent) {
				doc.add(e);
			}
		}
		public void add(final Element e) throws DocumentException {
			doc.add(e);
		}
	}, new InputStreamReader(bis));
	doc.close();
	</pre>
<p>The extended setup</p>
<pre>	// Create a TagProcessor
	DefaultTagProcessorFactory htmlTagProcessorFactory = (DefaultTagProcessorFactory) new Tags().getHtmlTagProcessorFactory();
	// if needed, map tags you don't want parsed to a DummyTagProcessor
	htmlTagProcessorFactory.addProcessor("img", new DummyTagProcessor());
	htmlTagProcessorFactory.addProcessor("link", new DummyTagProcessor());
	// Create a fresh configuration and set needed configuration objects.
	XMLWorkerConfigurationImpl config = new XMLWorkerConfigurationImpl();
	// Attach the default CSS to a new CSSResolver
	CssFile defaultCSS = new XMLWorkerHelper().getDefaultCSS();
	StyleAttrCSSResolver cssResolver = new StyleAttrCSSResolver();
	cssResolver.addCssFile(defaultCSS);
	// attach more CSS files if needed
	cssResolver.addCssFile(otherCssFile);
	// set the TagProcessorFactory
	config.tagProcessorFactory(htmlTagProcessorFactory).cssResolver(cssResolver)
			.acceptUnknown(true);
	// create a document
	final Document doc = new Document();
	doc.setPageSize(PageSize.A4);
	// create writer
	PdfWriter writer = PdfWriter.getInstance(doc, outputStream);
	writer.setPageEvent(new WatermarkEvent());
	// set margins for first page
	float margin = CssUtils.getInstance().parsePxInCmMmPcToPt("8px");
	doc.setMargins(margin, margin, margin, margin);
	// OPEN the document !
	doc.open();
	config.document(doc).pdfWriter(writer);
	// create the worker
	final XMLWorker worker = new XMLWorkerImpl(config);
	// attach an ElementHandler
	worker.setDocumentListener(new ElementHandler() {
		public void addAll(final List&lt;Element&gt; arg0) throws DocumentException {
			for (Element e : arg0) {
				doc.add(e);
			}
		}
		public void add(final Element e) throws DocumentException {
			doc.add(e);
		}
	});
	// Set the worker in the parser and start parsing
	XMLParser p = new XMLParser(worker);
	p.parse(new StringReader(content));
	writer.close();
	</pre>
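<p>The extended setup converts the CSS length <code>"8px"</code> to points before setting the margins. The arithmetic behind such a conversion can be sketched as follows. This is not the library's implementation: the class name is hypothetical, and the px ratio assumes the CSS reference of 96 px per inch, which may differ from what <code>CssUtils</code> uses.</p>

```java
import java.util.Locale;

// Hypothetical helper, not part of XMLWorker.
public class CssLengthToPoints {
    /** Converts an absolute CSS length such as "8px" or "2.54cm" to PDF points (1/72 inch). */
    public static float toPoints(String length) {
        String s = length.trim().toLowerCase(Locale.ROOT);
        if (s.length() < 3) {
            throw new IllegalArgumentException("Expected a number followed by a unit: " + length);
        }
        String unit = s.substring(s.length() - 2);
        float value = Float.parseFloat(s.substring(0, s.length() - 2).trim());
        switch (unit) {
            case "pt": return value;
            case "px": return value * 0.75f;        // assumption: 96 px per inch, 72 pt per inch
            case "in": return value * 72f;
            case "cm": return value * 72f / 2.54f;  // 2.54 cm per inch
            case "mm": return value * 72f / 25.4f;  // 25.4 mm per inch
            case "pc": return value * 12f;          // 1 pica = 12 pt
            default:
                throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPoints("8px")); // prints 6.0
    }
}
```

<p>Under that assumption an 8px margin corresponds to 6pt on each side of the page.</p>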
<h2>Demo</h2>
<p>An <a href="http://demo.itextsupport.com/xmlworker">online demo</a> is available in which input is entered through a TinyMCE editor and a PDF is created from it.</p>
<h2>Plans for the future</h2>
<ul>
<li>Support absolute positioning; this would require adding an intermediate step to the current transformation process.</li>
<li>Provide a way to add non-<code>com.itextpdf.text.Element</code> objects (e.g. a ColumnText) to the document.</li>
</ul>